On density-based data streams clustering algorithms: A survey
Clustering data streams has drawn considerable attention in the last few years due to their ever-growing presence. Data streams impose additional challenges on clustering, such as limited time and memory and the need for one-pass processing. Furthermore, discovering clusters of arbitrary shape is very important in data stream applications. Data streams are infinite and evolve over time, and the number of clusters is not known in advance. In a data stream environment, noise also appears occasionally due to various factors. Density-based methods are a notable class of algorithms for clustering data streams: they can discover clusters of arbitrary shape and detect noise, and they do not need the number of clusters in advance. Because of the characteristics of data streams, traditional density-based clustering is not directly applicable. Recently, many density-based clustering algorithms have been extended to data streams. The main idea in these algorithms is to use density-based methods in the clustering process while overcoming the constraints imposed by the nature of data streams. The purpose of this paper is to shed light on some algorithms in the literature on density-based clustering over data streams. We not only summarize the main density-based clustering algorithms on data streams and discuss their uniqueness and limitations, but also explain how they address the challenges in clustering data streams. Moreover, we investigate the evaluation metrics used in validating cluster quality and measuring the algorithms' performance. It is hoped that this survey will serve as a stepping stone for researchers studying data stream clustering, particularly density-based algorithms.
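To make the two properties named in this abstract concrete (no preset number of clusters, explicit noise detection), here is a minimal sketch of classic batch DBSCAN in pure Python. It is not one of the streaming algorithms the survey covers, and the function names and the parameters `eps` and `min_pts` are illustrative choices rather than anything taken from the paper.

```python
from math import dist

NOISE = -1  # label given to points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also core: keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The number of clusters falls out of the data and the two density parameters. This batch version stores all points and may revisit them, which is exactly what the limited-memory and one-pass constraints described in the abstract rule out for streams.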
Multi-sensor fusion based on multiple classifier systems for human activity identification
Multimodal sensors in healthcare applications have been increasingly researched because they facilitate automatic and comprehensive monitoring of human behaviors, high-intensity sports management, energy expenditure estimation, and postural detection. Recent studies have shown the importance of multi-sensor fusion for achieving robustness and high-performance generalization, providing diversity, and tackling challenging issues that may be difficult to address with single-sensor values. The aim of this study is to propose an innovative multi-sensor fusion framework to improve human activity detection performance and reduce the misrecognition rate. The study proposes a multi-view ensemble algorithm to integrate the predicted values of different motion sensors. To this end, computationally efficient classification algorithms such as decision trees, logistic regression and k-Nearest Neighbors were used to implement diverse, flexible and dynamic human activity detection systems. To provide a compact feature vector representation, we studied a hybrid bio-inspired evolutionary search algorithm and a correlation-based feature selection method and evaluated their impact on the feature vectors extracted from each individual sensor modality. Furthermore, we utilized the Synthetic Minority Over-sampling Technique (SMOTE) algorithm to reduce the impact of class imbalance and improve performance results. With the above methods, this paper provides a unified framework to resolve major challenges in human activity identification. The performance results obtained using two publicly available datasets showed significant improvement over baseline methods in the detection of specific activity details and a reduced error rate. Our evaluation showed a 3% to 24% improvement in accuracy, recall, precision, F-measure and detection ability (AUC) compared to single sensors and feature-level fusion.
The benefit of the proposed multi-sensor fusion is the ability to utilize the distinct feature characteristics of individual sensors and multiple classifier systems to improve recognition accuracy. In addition, the study suggests the promising potential of a hybrid feature selection approach and diversity-based multiple classifier systems to improve mobile and wearable sensor-based human activity detection and health monitoring systems. © 2019, The Author(s). This research is supported by University of Malaya BKP Special Grant no vote BKS006-2018.
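One simple way to integrate the predicted values of several per-sensor classifiers is decision-level fusion by majority vote. The sketch below illustrates only that general idea. The abstract does not state the paper's actual combination rule, so the voting scheme, the function name `fuse_predictions`, and the sample sensor names and labels are all assumptions made for illustration.

```python
from collections import Counter


def fuse_predictions(per_sensor_labels):
    """Combine per-sensor label sequences by majority vote.

    per_sensor_labels: list of equally long label lists, one per sensor.
    Ties go to the label that appears first among the sensors' votes,
    because Counter.most_common keeps first-seen order for equal counts.
    """
    fused = []
    for votes in zip(*per_sensor_labels):
        winner, _count = Counter(votes).most_common(1)[0]
        fused.append(winner)
    return fused


# Hypothetical outputs of three classifiers, each trained on one sensor.
accelerometer = ["walk", "walk", "sit", "run"]
gyroscope = ["walk", "sit", "sit", "run"]
magnetometer = ["run", "walk", "sit", "walk"]

fused = fuse_predictions([accelerometer, gyroscope, magnetometer])
```

Voting on predicted labels keeps each sensor's classifier independent, in contrast to the feature-level fusion baseline mentioned in the abstract, where the feature vectors are concatenated before a single classifier is trained.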
Bayesian nonparametric models for name disambiguation and supervised learning
This thesis presents new Bayesian nonparametric models and approaches for their development,
for the problems of name disambiguation and supervised learning. Bayesian
nonparametric methods form an increasingly popular approach for solving problems
that demand a high amount of model flexibility. However, this field is relatively new,
and there are many areas that need further investigation. Previous work on Bayesian
nonparametrics has neither fully explored the problems of entity disambiguation and
supervised learning nor the advantages of nested hierarchical models. Entity disambiguation
is a widely encountered problem where different references need to be linked
to a real underlying entity. This problem is often unsupervised as there is no previously
known information about the entities. Further to this, effective use of Bayesian
nonparametrics offers a new approach to tackling supervised problems, which are frequently
encountered.
The main original contribution of this thesis is a set of new structured Dirichlet process
mixture models for name disambiguation and supervised learning that can also
have a wide range of applications. These models use techniques from Bayesian statistics,
including hierarchical and nested Dirichlet processes, generalised linear models,
Markov chain Monte Carlo methods and optimisation techniques such as BFGS. The
new models have tangible advantages over existing methods in the field as shown with
experiments on real-world datasets including citation databases and classification and
regression datasets.
I develop the unsupervised author-topic space model for author disambiguation that
uses free text to perform disambiguation, unlike traditional author disambiguation approaches.
The model incorporates a name variant model that is based on a nonparametric
Dirichlet language model. The model both handles novel, unseen name variants and
models the unknown authors of the documents' text. Through this, the model
can disambiguate authors with no prior knowledge of the number of true authors in the
dataset. In addition, it can do this when the authors have identical names.
I use a model for nesting Dirichlet processes named the hybrid NDP-HDP. This
model allows Dirichlet processes to be clustered together and adds an additional level of
structure to the hierarchical Dirichlet process. I also develop a new hierarchical extension
to the hybrid NDP-HDP. I develop this model into the grouped author-topic model
for the entity disambiguation task. The grouped author-topic model uses clusters to
model the co-occurrence of entities in documents, which can be interpreted as research
groups. Since this model does not require entities to be linked to specific words in a
document, it overcomes the problems of some existing author-topic models. The model
incorporates a new method for modelling name variants, so that domain-specific name
variant models can be used.
Lastly, I develop extensions to supervised latent Dirichlet allocation, a type of supervised
topic model. The keyword-supervised LDA model predicts document responses
more accurately by modelling the effect of individual words and their contexts directly.
The supervised HDP model has more model flexibility by using Bayesian nonparametrics
for supervised learning. These models are evaluated on a number of classification
and regression problems, and the results show that they outperform existing supervised
topic modelling approaches. The models can also be extended to use similar information
to the previous models, incorporating additional information such as entities and
document titles to improve prediction.
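The reason a Dirichlet process mixture needs no preset number of clusters (here, authors) can be seen in the Chinese restaurant process, the standard sequential view of the partition a Dirichlet process induces. The sketch below samples such a partition. It is a generic illustration rather than any of the thesis's models, and the function name, the `seed` argument and the value of the concentration parameter `alpha` are assumptions.

```python
import random


def chinese_restaurant_process(n_items, alpha, seed=0):
    """Sample cluster assignments for n_items under a CRP with concentration alpha > 0.

    Item n (0-based) joins existing cluster k with probability
    counts[k] / (n + alpha) and opens a new cluster with probability
    alpha / (n + alpha).
    """
    rng = random.Random(seed)
    assignments = []
    counts = []  # counts[k] = number of items currently in cluster k
    for _ in range(n_items):
        weights = counts + [alpha]  # last entry stands for "new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


assignments, counts = chinese_restaurant_process(n_items=50, alpha=1.0, seed=42)
```

The number of clusters is an outcome of the sampling rather than an input. Larger values of `alpha` tend to produce more clusters, and the expected count grows roughly logarithmically with the number of items.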
Bayesian Painting by Numbers: Flexible Priors for Colour-Invariant Object Recognition
Generative models of images should take into account transformations of geometry and reflectance. Then, they can provide explanations of images that are factorized into intrinsic properties that are useful for subsequent tasks, such as object classification. It was previously shown how images and objects within images could be described as compositions of regions called structural elements or ‘stels’. In this way, transformations of the reflectance and illumination of object parts could be accounted for using a hidden variable that is used to ‘paint’ the same stel differently in different images. For example, the stel corresponding to the petals of a flower can be red in one image and yellow in another. Previous stel models have used a fixed number of stels per image and per image class. Here, we introduce a Bayesian stel model, the colour-invariant admixture (CIA) model, which can infer different numbers of stels for different object types, as appropriate. Results on Caltech101 images show that this method is capable of automatically selecting a number of stels that reflects the complexity of the object class and that these stels are useful for object recognition.
Data mining techniques using decision tree model in materialised projection and selection view
With the availability of very large data storage today, redundant data
structures are no longer a big issue. However, an intelligent way of managing
materialised projection and selection views that can lead to fast access of
data is the central issue dealt with in this paper. A set of implementation
steps for the data warehouse administrators or decision makers to improve
the response time of queries is also defined. The study concludes that both
attributes and tuples are important factors to be considered to improve the
response time of a query. The adoption of data mining techniques in the
physical design of data warehouses has been shown to be useful in practice.
Hybrid variational/Gibbs collapsed inference in topic models
24th Conference on Uncertainty in Artificial Intelligence, July 9-12, 2008, Helsinki, Finland
Improving word sense disambiguation using topic features
EMNLP-CoNLL 2007 - Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 1015-102